- Be able to navigate RStudio and write reproducible code.
- Execute lines of code, as well as complete scripts.
- Identify variables, functions, and operators.
- Approach basic troubleshooting.
- Know how to download biodiversity data using Application Programming Interfaces (APIs).
Students will learn R basics while downloading biodiversity data from multiple data repositories. This module walks students through installing R, navigating R, writing reproducible scripts in R, and using R to download biodiversity data.
Graph of R's rating over time, based on commonly used search engines (x-axis: year; y-axis: rating (%)), showing that the rating of R has increased over time.
R is a popular scripting language with readily available jobs (see jobs here: r-users.com)
R allows for reproducibility
R is free and open access
R is interdisciplinary
R can create beautiful figures
There are a lot of online resources for learning R. Throughout this activity we reference additional resources that may be useful. Below, we summarize the cited resources, as well as some additional references. We used many of these resources to create this activity.
R for cats and cat lovers
Tidyverse is a must-have suite of packages (defined in section 3.3) for data wrangling and analysis. Important:
- Install R and RStudio.
- Describe the purpose of the RStudio panes: Source Script & Document, Console, Environment & History, and Files-Viewer.
- Be able to open and start an R Project, and understand the purpose of the working directory.
Once you have R and RStudio installed, open RStudio and start to get oriented.
The source panel is where you can edit “.R” files, or scripts, which document the lines of code you use for your project. There are many ways to send code written in the source panel to the console (see more in Section 3.4). Other files, including those in the image below, can also be viewed and edited in this panel.
The console is where you can run commands (or lines of code) and see any printed output.
The environment tab shows active objects, such as a data file you read into R (see function read.csv). The history tab shows any past commands that were run during the current R session.
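To see an object appear in the Environment tab, you can read a small CSV file into R. The sketch below first writes the example sites and areas used later in this module to a temporary file (tempfile() is used so nothing is saved in your project folder), then reads the file back with read.csv(). The object names xy, csv_path, and xy_read are only example names.

```r
# Create a small data frame from example values used later in this module
xy <- data.frame(sites = c("a", "b", "c", "d"),
                 areas = c(5, 12, 10, 11))

# Write it to a temporary CSV file so nothing is saved in your project folder
csv_path <- tempfile(fileext = ".csv")
write.csv(xy, csv_path, row.names = FALSE)

# Read the file back in; xy_read now appears in the Environment tab
xy_read <- read.csv(csv_path)
str(xy_read)

# List the objects in the current environment
ls()
```

read.csv() guesses the type of each column, so areas comes back as whole numbers (integer) rather than numeric.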
The Files tab shows the files in your working directory. The Plots tab shows graphs and plots produced by running code. The Packages tab lists all installed packages. Packages are units that contain a group of functions, or commands, for a specific purpose. For information about specific functions, you can search the Help tab. The Viewer can be used to view R Notebooks and R Markdown documents (R Markdown combines R code with Markdown syntax and is used to make documents like this one).
RStudio projects set R working directories. This allows all files associated with a single project to be stored in one location.
A working directory is a reference point that indicates the path to the files needed in your code. You can think of a directory path as a list of directions for the computer to follow to find or save files related to your project.
In an RStudio project, the project folder is the working directory. Inside it, you may create folders such as scripts and data to store and organize your files.
If you did not open a project, you may instead be working with individual R scripts, either within RStudio or by running R from the command line. In these cases, you can check your working directory using getwd(). To set your working directory, use setwd("/path/to/working/directory").
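The commands below are a minimal sketch of checking and using the working directory. The folder "data", the file "mydata.csv", and the setwd() path are hypothetical names for illustration. setwd() is left commented out so the script does not fail on a computer where that folder does not exist.

```r
# Print the current working directory
getwd()

# Build a path to a file inside the working directory;
# "data" and "mydata.csv" are hypothetical names for illustration
file.path(getwd(), "data", "mydata.csv")

# List the files and folders R can see in the working directory
list.files()

# Change the working directory (hypothetical path; edit before running)
# setwd("~/Desktop/ClassFolder/Introduction2R")
```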
Open RStudio, then click “File” and “New Project…”
Choose “New Directory”
Choose “New Project”
For this example, “Directory name:” was set to “Introduction2R”. Always name your project with a phrase that is meaningful and indicates the purpose of the project. “Create project as subdirectory of” was set to “~/Desktop/ClassFolder”, which is where the project will be saved on your computer.
Congrats! You made your first R project!
Test yourself: You now should be able to answer Question 1 in the assessment.
In this pre-activity, we will go over some R Basics. At the end of this chapter, you will write your first R script!
- Define objects, operators, functions, and R packages.
- Use the built-in RStudio help interface to search for more information on R functions.
- Run commands in the console and from a script.
- Demonstrate how to provide sufficient information for troubleshooting with the R user community.
- Be able to write R scripts in a reproducible manner.
Below we summarize some R basics. To find out more about R scripts, review the linked page for tips about:
- Code Completion
- Find and Replace
- Extract Function
- Comment/Uncomment
- Executing Code
To open your first .R file, or R script, click the paper icon with the green plus sign, or choose “File” -> “New File” -> “R Script”.
R Scripts are used to document the code you use in your project. These scripts can be used as a reference for yourself and allow others to reproduce your analysis. Learn more about running code from R Scripts in section 3.4.
Naming R Scripts
Make sure your R script file names are meaningful and end in .R.
Avoid using special characters in file names. Instead use numbers, letters, dashes (-), and underscores (_).
If files should be run in a particular order, prefix them with numbers. For example, 01_setup.R.
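If you want to check file names against these rules, a regular expression can do it. is_good_script_name() below is a hypothetical helper written for this example. It accepts only letters, numbers, dashes, and underscores, followed by .R.

```r
# Hypothetical helper: TRUE if a file name uses only letters, numbers,
# dashes, and underscores, and ends in .R
is_good_script_name <- function(filename) {
  grepl("^[A-Za-z0-9_-]+\\.R$", filename)
}

is_good_script_name(c("01_setup.R", "BasicScript.R", "my script!.R", "notes.txt"))
## [1]  TRUE  TRUE FALSE FALSE
```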
object: a named piece of data stored in your R environment. In R, this term is often used interchangeably with variable.
weight <- 3
class(weight)
## [1] "numeric"
operator: assignment operators assign a value to an object. Assignment operators may be a back arrow <- or an equal sign =.
In RStudio on a PC, typing Alt + - (push the keys at the same time) writes <-. On a Mac, typing Option + - does the same.
command: the complete line shown above can be called a command; here an object is assigned a value. This value can be assigned directly (e.g., x <- 10) or calculated through a function (see the definition of function below).
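The short example below shows both assignment operators storing the same value, and a value that is calculated by a function before it is assigned. The names x, y, and z are arbitrary example names.

```r
# Both assignment operators store the value 10
x <- 10
y = 10
identical(x, y)
## [1] TRUE

# A value can also be calculated by a function before it is assigned
z <- sqrt(x * 10)
z
## [1] 10
```

For a simple assignment like this, the two operators behave the same. Many R style guides, including the tidyverse style guide, recommend <- for assignment.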
function: a set of instructions that you can apply to an object or a group of objects.
R has base functions, or functions that come with it, like round:
weight <- 3.4759875
round(weight)
## [1] 3
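Many functions take extra arguments that change what they do. For example, round() has a digits argument that sets how many decimal places to keep. The related base function signif() keeps a number of significant digits instead.

```r
weight <- 3.4759875

# Round to two decimal places with the digits argument
round(weight, digits = 2)
## [1] 3.48

# Keep two significant digits instead of two decimal places
signif(weight, digits = 2)
## [1] 3.5
```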
If you need help with a specific function, let’s say plot(), you can type:
?plot
You can also write your own functions. For example, suppose we have a weight in kilograms that we want to convert to pounds and then round. We could write our own function to do this. Here the function is called convertkg2lb: it first converts kg to lb, then rounds this value, and finally returns the rounded weight in lb.
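convertkg2lb <- function(weightkg){
  # Convert kilograms to pounds (1 kg = 2.20462 lb)
  weightlb <- weightkg*2.20462
  # Round to the nearest whole pound
  rounded_weightlb <- round(weightlb)
  # Send the rounded value back to the caller
  return(rounded_weightlb)
}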
weight <- 3.4759875
convertkg2lb(weight)
## [1] 8
Writing this function may not seem worthwhile when you only have one weight to convert. However, if you need to convert and round 100 weights, the function lets you do so without repeating the conversion and rounding code for each weight.
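Multiplication and round() both work on every element of a vector. Because of this, convertkg2lb() can convert many weights in a single call. The function is repeated here so the example runs on its own. The three weights are arbitrary example values.

```r
convertkg2lb <- function(weightkg){
  weightlb <- weightkg*2.20462          # convert kg to lb
  rounded_weightlb <- round(weightlb)   # round to the nearest pound
  return(rounded_weightlb)
}

# Convert several weights at once by passing a vector
convertkg2lb(c(1, 2.5, 10))
## [1]  2  6 22
```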
R package: a group of functions bundled together. Official R packages are available via CRAN. For the class activity, we will be using packages from the Tidyverse.
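A package is installed once with install.packages() and then loaded in each session with library(). The sketch below wraps both steps so that a package is installed only when it is missing. load_or_install() is a hypothetical helper written for this example, not part of base R. Installing requires internet access and a CRAN mirror.

```r
# Hypothetical helper: install a package only if it is missing, then load it
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE (quietly) when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs internet access and a CRAN mirror
  }
  # character.only = TRUE lets library() take the package name from a variable
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Example usage (installs dplyr from CRAN the first time, if needed):
# load_or_install("dplyr")
```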
You can type commands directly into the console and press Enter to execute those commands. Sadly, commands run in the console will be forgotten when you close the session.
To document your code and allow for reproducibility, it is better to type commands in the script editor and save the script. RStudio lets you execute commands directly from the script editor with the Ctrl + Enter shortcut on a PC, or Cmd + Return on a Mac.
You can also run the current line by clicking the “Run” button, or run the whole script by clicking “Source”.
You can find other keyboard shortcuts in this RStudio cheatsheet about the RStudio IDE.
1. Read the error message
- For example, if your message reads “incorrect field specification”, check the function's help page (?function_name) to make sure you specified the correct arguments.
2. Google the error message
- Sometimes the error message will be confusing, and reading it will not provide any insight. This is when Google is helpful!
- If your error message is super generic, also include the name of the function or package when googling.
3. Ask for help
- If google did not answer your question, the next step would be to ask your classmates and/or instructor for help.
- Include the output of sessionInfo(), which prints the version of R, the packages loaded, and other useful information.
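The sketch below applies the first and last troubleshooting steps in code. It uses tryCatch() to capture an error message as text so you can read it or paste it into a search, then prints the session information to include with a help request. The call log("ten") is a deliberately broken example.

```r
# Capture an error message instead of stopping the script
result <- tryCatch(
  log("ten"),                              # log() expects a number, not text
  error = function(e) conditionMessage(e)  # return the error message as text
)
result

# Print just the R version string
sessionInfo()$R.version$version.string

# Print the full session information (R version, platform, loaded packages)
sessionInfo()
```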
Start by asking your classmates and instructor.
Stack Overflow: Check out this awesome blog post on “How do I ask a good question?” before posting.
How to ask for R help: Make sure you are asking good questions!
Documenting the steps in your workflow should be done in a way that is reproducible. There are many R style guides on how to annotate and organize your code; check out the tidyverse style guide.
At the start of each .R script, write a header that defines the purpose of the script (why, what), when it was first written and last updated, and your name.
# Title of my document
## More information (why, what)
## Initial date: YYYY-MM-DD
## Update date: YYYY-MM-DD
## Name
I always start my scripts by loading packages (if the script uses any), so that every line that follows can use the functions in these packages.
# Load Packages
library(dplyr)
Each section should have a title starting with “#” that defines what the section includes. R treats everything after a hashtag on a line as a comment (text), not code. Following the section title, you may want to add a description starting with “##”.
# Vectors
## Create a vector of characters
sites <- c("a", "b", "c", "d")
## Create a vector of numbers
areas <- c(5, 12, 10, 11)
Below is a basic script. Open your own R script and write this script in your new file. Next, run each line.
- Write your first R script. Name this script “BasicScript.R”.
- Practice running lines of code.
- Be able to identify classes of objects.
- Identify what each function does and add comments to your script to indicate what each line does.
# Basic Script
## This is a basic script example
## 2020-10-02
## Name
# Objects
## Define objects of different classes
weight <- 3
class(weight) # numeric
weight <- 3L # integer
weight <- 3.5 # double
weight <- 3+2i # complex
hair <- TRUE # logical
hair <- "yellow" # character
hair = "brown"
# Vectors
## Create a vector of characters
sites <- c("a", "b", "c", "d")
## Create a vector of numbers
areas <- c(5, 12, 10, 11)
# Slice
### "give me a part of something"
### practice selecting from the vectors
sites[1:3]
areas[3]
# Combining vectors
## c or combine (the numbers are converted to text)
combine <- c(sites, areas)
combine
## rbind or row bind
combine_rbind <- rbind(sites, areas)
combine_rbind
## cbind or column bind
### Here you have a command surrounded by parentheses -
### What happens? Run this line to find out!
(combine_cbind <- cbind(sites, areas))
combine_cbind
# Dataframes
## making a data frame
(xy <- data.frame(sites, areas))
xy
## Explore a data frame
str(xy)
head(xy)
View(xy)
xy$areas
class(xy$areas)
length(xy$areas)
nrow(xy)
ncol(xy)
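To check the comments you added for each object class, you can ask R directly. class() reports the class R uses for a value. typeof() reports the underlying storage type, which separates integers from doubles. The values below are the same ones used in the basic script. They are stored in a list because a list, unlike a vector, can hold values of different classes.

```r
# Store one example value of each class in a list
values <- list(3, 3L, 3.5, 3+2i, TRUE, "yellow")

# class() reports the class of each value
sapply(values, class)
## [1] "numeric"   "integer"   "numeric"   "complex"   "logical"   "character"

# typeof() reports the storage type, which separates double from integer
sapply(values, typeof)
## [1] "double"    "integer"   "double"    "complex"   "logical"   "character"

# Combining numbers and text with c() converts everything to character
class(c("a", 5))
## [1] "character"
```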
Test yourself: You now should be able to answer Questions 2 to 4 in the assessment.
- Demonstrate how to download biodiversity data through an Application Programming Interface.
- Plot occurrence data on a simple map.
iDigBio, or Integrated Digitized Biocollections, is a biodiversity data aggregator. It currently holds over 125 million specimen records and over 40 million media records. The specimen records are mostly plants and animals, while most media records depict plant specimens.
Here is an example of a plant media record:
An API, or Application Programming Interface, allows a user to interact with a system that contains data. In this case, we are interacting with iDigBio, a biodiversity data aggregator. Even when we use the web portal for iDigBio, we are still interacting with the API. Here we will learn how to interact with the iDigBio API using R.
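An API query is ultimately a web address. The sketch below builds one for the iDigBio search API. It assumes the v2 records endpoint (https://search.idigbio.org/v2/search/records/), with an rq parameter holding a JSON query and a limit parameter; check the iDigBio API documentation before relying on these details. build_idigbio_url() is a hypothetical helper written for this example. The download itself is commented out because it needs internet access and the jsonlite package.

```r
# Hypothetical helper: build an iDigBio search URL for one scientific name.
# Assumes the v2 records endpoint with rq (JSON query) and limit parameters.
build_idigbio_url <- function(scientific_name, limit = 10) {
  # Write the query as JSON, e.g. {"scientificname":"spartina alterniflora"}
  rq_json <- sprintf('{"scientificname":"%s"}', tolower(scientific_name))
  paste0(
    "https://search.idigbio.org/v2/search/records/",
    "?rq=", utils::URLencode(rq_json, reserved = TRUE),  # escape { } " : and spaces
    "&limit=", limit
  )
}

url <- build_idigbio_url("Spartina alterniflora", limit = 5)
url

# Downloading needs internet access and the jsonlite package:
# records <- jsonlite::fromJSON(url)
# str(records, max.level = 1)
```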
Spartina alterniflora, smooth cordgrass, grows along shorelines throughout both the Atlantic and Gulf coasts of North America. In its native range, it is used in ecosystem restoration due to its extensive rooting capabilities.